

## Power management for digital processing apparatus

The invention relates to a device and method for power management for a digital processing apparatus.

The use of clocked mode digital logic integrated circuits, in particular microprocessors, is commonplace in a wide variety of goods. It is desirable to reduce the power needed by such circuits, since this reduces the energy costs involved in operating the goods in which they are installed. In addition, excessive power dissipation within a circuit may cause a temperature rise that could shorten the life-span of the circuit. To reduce these problems, circuits have been devised in which certain parts are "turned off" when not in use.

In clocked mode digital logic circuits, the turned-off state can be achieved by not supplying a clock signal to those parts of the circuit which are not required at a given time. Since the current (and therefore power) drawn by clocked digital circuits is a function of clock speed, and the clock speed of such circuits is increasing as technology advances, the ability to turn off the parts of a circuit which are not required is becoming more advantageous. Turning large parts of the circuit on and off is not without problems, the most important of which is the step variation in current that the power supply has to provide as all the elements of that part of the circuit switch on or off simultaneously.

A number of solutions exist to aid the transition between low current supply and high current supply. In one such solution, a dummy load resistance is provided in parallel with the circuit to be turned on or off. The dummy resistance is varied to gradually increase the power drawn from the source up to the power needed by external additional circuitry, at which point the circuitry is switched on and the dummy resistance removed. This scheme, which is applied in reverse when the circuitry is switched off, is described in US patent 5,646,572 (IBM). Alternatively, as described in US patent 5,964,881 (AMD), the rate of the clock can be slowed at switch-on to reduce the power needed by the additional circuitry, then increased gradually over a number of clock cycles to bring the circuit up to operating speed. This scheme can also be applied in reverse when the circuitry is switched off. Until the clock speed has synchronized, no signal processing is possible.

Both the above-mentioned schemes require complex additional circuitry.

In addition to any provisions outlined above or otherwise, on-chip capacitors are needed to decouple power supply bounce and ground bounce and absorb the transient current demands produced by the switching on or off of clocked mode digital circuits. In the case of integrated circuits such capacitors may be fabricated on the chip, which is expensive and consumes large die areas. Alternatively, off-chip capacitors may be used, but these are not as effective and also necessitate extra manufacturing steps. Off-chip decoupling results in supply currents through the IC package that will therefore contribute to RF radiation. It is therefore advantageous to minimize the off-chip capacitance required to absorb the transient current demands by reducing the transients, but without introducing additional complex circuitry or otherwise seriously compromising the operation of the circuit as a whole.

It is an object of embodiments of the present invention to provide a method and device for reducing the step change in current required from a power supply as a clocked digital circuit switches on or off, which overcomes some of the problems associated with the prior art, whether referred to herein or otherwise. To this end, the invention provides power management as defined in the independent claims. Advantageous embodiments are defined in the dependent claims.

According to a first aspect of the present invention, there is provided a method of power management in a digital processing apparatus, the method comprising: receiving a free-running master clock signal; and from said master clock signal generating a plurality of sub-clocking signals, wherein said plurality of sub-clocking signals change from a power-up rest condition to a free running condition one at a time, following an initial switch-on of said digital processing apparatus.

According to a second aspect of the invention, there is provided a device for power management for a digital processing apparatus, the device comprising: means for receiving a free running master clock signal and generating a plurality of sub-clocking signals, wherein said plurality of sub-clocking signals change from a power-up rest condition to a free running condition one at a time, following an initial switch-on of said digital processing apparatus.

The device and method provide a convenient way of gradually starting up apparatus and thereby controlling supply current at switch-on.

Clocking data parts with separately generated clocks as set out in claim 3 provides a controlled increase in supply demand following switch-on and enables prioritization of the order of activation of data parts, based either on power requirements or on importance.

Each data processing part may comprise circuitry for processing a particular data bit or bits of a data word, which is particularly useful where the processing apparatus has a pipeline arrangement.

Said digital signal processing apparatus has a particular maximum data width and conveniently said plurality of sub-clocking signals may correspond to said maximum data width.

In certain embodiments said plurality of sub-clocking signals may, during a switch-off phase, change from a free running condition to a rest condition one at a time. By employing such a "soft" switch-off, undesirable transient effects may be avoided.

For a better understanding of the invention, and to show how embodiments of the same may be carried into effect, reference will now be made, by way of example, to the accompanying diagrammatic drawings in which:

Fig. 1 is a schematic circuit diagram of an embodiment of the present invention; and

Fig. 2 is a timing diagram for the Fig. 1 circuit.

Referring now to Fig. 1, there is shown an example of a device embodying the present invention. The device comprises a shift register 10 and logic circuitry 20. There is also shown in schematic format digital processing apparatus 30 that is to be managed by the device.

The shift register 10 comprises a plurality of interconnected flip-flops 12<sub>0</sub>, 12<sub>1</sub>, 12<sub>2</sub>, 12<sub>3</sub>. Where the digital processing apparatus to be controlled is a pipeline arrangement, the number of flip-flops supplied is determined by the pipeline depth. Each flip-flop 12<sub>0</sub>, 12<sub>1</sub>, 12<sub>2</sub>, 12<sub>3</sub> has a number of connections comprising a clock input **CLK**, a data input **D**, a data output **Q**, a set input **ST** and a clear input **RES**.

The data input **D** of the first flip-flop 12<sub>0</sub> is connected to a control signal **Cntrl**. The data output **Q** of the first flip-flop 12<sub>0</sub> is connected to the data input **D** of the second flip-flop 12<sub>1</sub> and also provides a first enable signal **a** to the logic circuit 20. The second flip-flop 12<sub>1</sub> has its data output **Q** connected to the data input **D** of the third flip-flop 12<sub>2</sub> and also provides a second enable signal **b** to the logic circuit 20. The third flip-flop 12<sub>2</sub> has its data output **Q** connected to the data input **D** of the fourth flip-flop 12<sub>3</sub> and also provides a third enable signal **c** to the logic circuit 20. The fourth flip-flop 12<sub>3</sub> has its data output **Q** connected to the logic circuit 20 so as to provide it with a fourth enable signal **d**.

The flip-flops 12<sub>0</sub>, 12<sub>1</sub>, 12<sub>2</sub>, 12<sub>3</sub> are connected via their respective reset inputs **RES** to a common clear line **CLR** and are also commonly clocked via their respective clock inputs **CLK**.

The logic circuit 20 comprises a plurality of **AND** gates 22<sub>0</sub>, 22<sub>1</sub>, 22<sub>2</sub> and 22<sub>3</sub>. Each **AND** gate 22<sub>0</sub>, 22<sub>1</sub>, 22<sub>2</sub>, 22<sub>3</sub> has a first input 24<sub>0</sub>, 24<sub>1</sub>, 24<sub>2</sub>, 24<sub>3</sub> and a second input 26<sub>0</sub>, 26<sub>1</sub>, 26<sub>2</sub>, 26<sub>3</sub> and an output **CLK<sub>0</sub>**, **CLK<sub>1</sub>**, **CLK<sub>2</sub>**, **CLK<sub>3</sub>**. The first inputs 24<sub>0</sub>, 24<sub>1</sub>, 24<sub>2</sub>, 24<sub>3</sub> of the **AND** gates 22<sub>0</sub>, 22<sub>1</sub>, 22<sub>2</sub>, 22<sub>3</sub> are connected, respectively, to receive the first to fourth enable signals **a**, **b**, **c**, **d**. The second inputs 26<sub>0</sub>, 26<sub>1</sub>, 26<sub>2</sub>, 26<sub>3</sub> of the **AND** gates 22<sub>0</sub>, 22<sub>1</sub>, 22<sub>2</sub>, 22<sub>3</sub> are commonly connected to clock line **CLK**. The outputs **CLK<sub>0</sub>**, **CLK<sub>1</sub>**, **CLK<sub>2</sub>**, **CLK<sub>3</sub>** are output to the digital processing apparatus 30, to form sub-clocks of individual data processing parts 30<sub>1</sub>-30<sub>3</sub> that receive data **DT**.

The operation of the circuit of Fig. 1 will now be described with reference to the timing diagram of Fig. 2, which shows a master clock signal **CLK**, and timings relative to the master clock **CLK** for the enable signals **a**, **b**, **c**, **d**, the output sub-clocking signals **CLK<sub>0</sub>**, **CLK<sub>1</sub>**, **CLK<sub>2</sub>** and **CLK<sub>3</sub>**, and a supply current **I<sub>suppl</sub>**.

Referring now to Fig. 1, an initial state of the shift register 10 will be considered.

At power up of the system, a power on reset function sends a signal via the clear line **CLR** to reset terminals **RES** of the individual flip-flops 12<sub>0</sub> to 12<sub>3</sub> of the shift register 10, so as to initially load the shift register 10 with logical 0's.

The reset function is used during start-up. During power-up, the reset line **CLR** is kept low, to ensure a non-operative circuit, i.e. a low supply current, by clearing the outputs of all flip-flops. In this way, none of the circuits normally driven by the clock receive a clock signal. Thereafter, when data processing is required, a control device is arranged to set the data input **D** of the first flip-flop 12<sub>0</sub> to be a logic high.

According to the timing diagram, when the first clock pulse after the power on reset is applied to the **CLK** inputs of the flip-flops 12<sub>0</sub> to 12<sub>3</sub>, the logical 1 at the **D** input of flip-flop 12<sub>0</sub> is clocked through to the output **Q** so as to send signal **a** high. It will be evident that as subsequent clock pulses are input to the **CLK** terminals of the flip-flops 12<sub>0</sub> to 12<sub>3</sub> of the shift register 10, the register will, in four cycles of the clock, change the states of the respective flip-flops 12<sub>0</sub> to 12<sub>3</sub> from 0000 to 1000 to 1100 to 1110 to 1111. Thereafter, the shift register 10 will be full of logic 1's during the normal subsequent operations of the digital signal processing apparatus of which this circuit forms a part.
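By way of illustration only, the progression of register states described above can be reproduced with a short behavioural sketch. The function names `shift_in` and `power_up_sequence` are hypothetical and do not appear in the specification; the sketch assumes an ideal four-stage register cleared by the power on reset and a control input held at logic 1.

```python
def shift_in(state, d):
    """Model one rising clock edge: d enters stage 0 and every other
    stage takes the previous value of its predecessor."""
    return [d] + state[:-1]


def power_up_sequence(stages=4, cycles=4, cntrl=1):
    """Return the register contents after reset and after each clock edge."""
    state = [0] * stages                      # power on reset via CLR
    history = ["".join(str(bit) for bit in state)]
    for _ in range(cycles):
        state = shift_in(state, cntrl)        # Cntrl held high (assumption)
        history.append("".join(str(bit) for bit in state))
    return history


print(power_up_sequence())
# ['0000', '1000', '1100', '1110', '1111']
```

Each string lists the outputs in the order **a**, **b**, **c**, **d**, so the leftmost bit corresponds to flip-flop 12<sub>0</sub>.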

The outputs **a**, **b**, **c**, **d** of the shift register 10, as explained above, progress from a logic 0 state at initial turn-on of the apparatus to a logic 1 state, and then stay at that logic 1 state, the first signal **a** rising one clock cycle before the second signal **b**, which in turn rises one clock cycle before the third signal **c**, which in turn rises one clock cycle before the fourth signal **d**.

Enable signals **a** to **d** form validating inputs to **AND** gates 22<sub>0</sub> to 22<sub>3</sub> of the logic circuit 20.

The enable signals **a** to **d** are fed to the first inputs 24<sub>0</sub> to 24<sub>3</sub> of the **AND** gates 22<sub>0</sub> to 22<sub>3</sub>, and the master clock signal **CLK** is fed to the second inputs 26<sub>0</sub> to 26<sub>3</sub>.

Sub-clocking signals **CLK<sub>0</sub>** to **CLK<sub>3</sub>** are produced at the outputs of the **AND** gates 22<sub>0</sub> to 22<sub>3</sub> as shown in Fig. 2.
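The gating performed by the logic circuit 20 may be sketched as follows. This is an idealised model only: the name `gated_clocks`, the two-samples-per-cycle representation of the master clock and the neglect of flip-flop propagation delay are all assumptions made for the illustration, and the exact position of the first pulse on each sub-clock in Fig. 2 may differ.

```python
# Enable signals a, b, c, d after each successive master clock edge (assumed
# values, following the 1000 -> 1100 -> 1110 -> 1111 progression in the text).
ENABLES_PER_CYCLE = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]


def gated_clocks(enables_per_cycle):
    """Return one waveform per sub-clock, sampled twice per master cycle
    (high half-period first, then low half-period)."""
    n = len(enables_per_cycle[0])
    waves = [[] for _ in range(n)]
    for enables in enables_per_cycle:
        for clk in (1, 0):                    # master clock CLK
            for i, en in enumerate(enables):
                waves[i].append(en & clk)     # AND gate for sub-clock i
    return waves


SUB_CLOCKS = gated_clocks(ENABLES_PER_CYCLE)
for i, wave in enumerate(SUB_CLOCKS):
    print(f"CLK{i}:", "".join(str(v) for v in wave))
# CLK0: 1010101010
# CLK1: 0010101010
# CLK2: 0000101010
# CLK3: 0000001010
```

Under these assumptions each sub-clock starts to toggle one master cycle after the preceding one and thereafter follows the master clock.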

In the above fashion, it can be seen that the progressive loading of logic 1's through the register 10 ensures that the clock signal is applied to the pipeline circuitry progressively, in proportion to the applied signal to be processed by the controlled digital processing apparatus.

The circuitry as described above is of particular use when data is being processed in a serial fashion and when the order of the data bits proceeds in a predetermined manner. It is particularly of use in pipeline processing where a dedicated data processing part 30<sub>1</sub>-30<sub>3</sub> of processing apparatus 30 is provided for each individual data bit of a data word. In such cases, individual processing parts 30<sub>1</sub>-30<sub>3</sub> may, at switch on, receive individual respective clocking signals **CLK<sub>0</sub>** to **CLK<sub>3</sub>** such that a first bit of received data would have its processing part clocked by sub-clocking signal **CLK<sub>0</sub>**, a second would have its clock signal provided by sub-clocking signal **CLK<sub>1</sub>**, a third by sub-clocking signal **CLK<sub>2</sub>** and a fourth by sub-clocking signal **CLK<sub>3</sub>**. In this manner, at turn on, the individual processing parts are effectively activated one at a time. In complex pipeline structures, there may be a significant power drain from each process stream as it is clocked, and such sequential turn on enables the supply current **I<sub>suppl</sub>** of the overall apparatus to slowly ramp up to its full value. By allowing such slow ramping, the problems of the prior art are overcome or reduced to a certain extent.
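The effect of sequential turn on upon the supply current can be illustrated with a deliberately simple model. It is assumed here, purely for the sketch, that each clocked processing part draws the same hypothetical unit current and that an unclocked part draws none; the names `supply_profile` and `largest_step` are likewise illustrative.

```python
def supply_profile(enable_history, unit_current=1.0):
    """Total supply current per cycle, assuming every enabled part draws
    the same (hypothetical) unit current and disabled parts draw none."""
    return [unit_current * sum(state) for state in enable_history]


def largest_step(profile):
    """Largest cycle-to-cycle change in supply current."""
    return max(abs(b - a) for a, b in zip(profile, profile[1:]))


# Enables a, b, c, d per cycle: staggered turn on versus all parts at once.
STAGGERED = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
SIMULTANEOUS = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]

print(supply_profile(STAGGERED), largest_step(supply_profile(STAGGERED)))
# [0.0, 1.0, 2.0, 3.0, 4.0] 1.0
print(supply_profile(SIMULTANEOUS), largest_step(supply_profile(SIMULTANEOUS)))
# [0.0, 4.0, 4.0, 4.0, 4.0] 4.0
```

In this model both schemes reach the same final current, but the staggered scheme limits each step to the current of a single processing part.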

It will be evident to the man skilled in the art that the circuitry may also be arranged so as to provide a controlled turn off of the system, so as to avoid any problems which might occur if the supply current were to be reduced suddenly. This may be achieved by maintaining the normal condition of each output of the register 10 at logic 1 until all data desired to be processed has been processed, and thereafter loading the register progressively with logic 0's. In other words, when the last useful data has passed the data entry point of the pipeline, the control line **Cntrl** may be brought low and 0's fed into the register 10 to give a slow decay of supply current by stopping the sub-clocks **CLK<sub>0</sub>** through **CLK<sub>3</sub>** one at a time.
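A controlled turn off of this kind can be sketched by driving the same behavioural register model with a control input that is first held high and then brought low. The function name `run_register` and the particular length of the high and low periods are assumptions chosen for the example.

```python
def run_register(cntrl_values, stages=4):
    """Clock the shift register once per entry in cntrl_values, starting
    from the reset state, and return the contents after each edge."""
    state = [0] * stages
    trace = []
    for d in cntrl_values:
        state = [d] + state[:-1]              # Cntrl enters stage 0
        trace.append("".join(str(bit) for bit in state))
    return trace


# Cntrl high for six cycles (start-up and processing), then low for four.
TRACE = run_register([1, 1, 1, 1, 1, 1, 0, 0, 0, 0])
print(TRACE)
# ['1000', '1100', '1110', '1111', '1111', '1111', '0111', '0011', '0001', '0000']
```

Reading each string as **a**, **b**, **c**, **d**, the enables fall in the same order in which they rose, so in this model one sub-clock stops per master clock cycle.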

Also shown in Fig. 1 is a set line **ST**. This set function may be utilized by control circuitry to force a high output condition at each output of the register 10 simultaneously, so as to avoid the gradual system waking up period described. This set feature can be utilized when the digital processing apparatus in question needs to be tested, and in such conditions a test may be carried out with the minimum of delay.

In a **JTAG** test mode, instantaneous data processing can be carried out where various registers in the pipeline have data patterns fed into them and in which data is not clocked serially.

It should be appreciated that under normal operation (i.e. beyond the start-up phase) of the digital processing apparatus the reset line **CLR** should never be used, as it will cause all sub-clocks to shut down at once and therefore cause processing glitches.
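The difference between the set and clear functions and the clocked path may be illustrated with a small behavioural model. The class `EnableRegister` and its method names are hypothetical; set and clear are modelled as immediate overrides of all stages, which is an assumption about their behaviour rather than a detail taken from Fig. 1.

```python
class EnableRegister:
    """Behavioural model of the enable register with set and clear overrides."""

    def __init__(self, stages=4):
        self.state = [0] * stages

    def clock(self, cntrl):
        """One master clock edge: shift cntrl into the first stage."""
        self.state = [cntrl] + self.state[:-1]

    def set_all(self):
        """Set line ST: force every enable high at once (test use)."""
        self.state = [1] * len(self.state)

    def clear_all(self):
        """Clear line CLR: force every enable low at once."""
        self.state = [0] * len(self.state)

    def active(self):
        """Number of sub-clocks currently enabled."""
        return sum(self.state)


reg = EnableRegister()
reg.set_all()
after_set = reg.active()          # 4: all sub-clocks enabled with no ramp
reg.clock(0)
after_soft_step = reg.active()    # 3: one sub-clock stopped by the clocked path
reg.clear_all()
after_clear = reg.active()        # 0: the remaining three stopped together
print(after_set, after_soft_step, after_clear)
# 4 3 0
```

The last transition shows why the clear line is unsuitable during normal operation: several sub-clocks stop in the same instant rather than one per clock cycle.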

It will also be evident that the sequential turn on or off of the sub-clocks need not be made in order of data bit receipt since, during turn on, there may be one or more data cycles in which the overall apparatus for which the circuitry of the present invention is intended takes time to stabilize. Synchronizing the clocking signals with the arrival of data bits is therefore not essential, as sequential turn on of the different processing streams may occur during a short wake up cycle of the apparatus, so that by the time valid data arrives all of the different data processing streams are receiving clock signals.

It will further be appreciated by the man skilled in the art that although a specific shift register layout and specific logic circuit layout has been shown, equivalent circuitry may replace those elements shown in the Figures. For instance, the logic circuitry may further include buffering elements, may be comprised of NAND gates or other processing logic, whereas the shift register may be configured differently to the layout shown in Fig. 1. It should thus be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the device claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.
